1 Overview

Data

Normalization

Model options

Network processing

cor/cov method: pearson

2 Load and normalize data

2.1 Load TCGA data

See prepare.tcga.survival.data function for how to configure parameters.

This uses data packages obtained from ‘github:averissimo/tcga’ that contain FPKM expression levels.

prepare.tcga.survival.data('prad.data.2018.10.11',
                           'primary.solid.tumor',
                            normalization        = 'center',
                            log2.pre.normalize   = TRUE,
                            handle.duplicates    = 'keep_first',
                            coding.genes         = TRUE,
                            subtract.surv.column = '')
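The call above asks for a log2 pre-normalization followed by centering. A minimal sketch of what that combination could look like on a toy matrix is given below. The `+ 1` offset inside the logarithm, the per-gene centering and the helper name `log2_center` are assumptions made for illustration; they are not taken from prepare.tcga.survival.data.

```r
# Toy FPKM matrix: rows are individuals, columns are genes (made-up values)
fpkm <- matrix(c(0, 3, 7, 15,
                 1, 1, 31, 63),
               nrow = 4,
               dimnames = list(paste0("patient", 1:4), c("geneA", "geneB")))

# Hypothetical helper: log2 transform with a +1 offset (assumed, avoids log2(0)),
# then subtract each gene's mean so that every column is centered on zero
log2_center <- function(x) {
    x.log <- log2(x + 1)
    scale(x.log, center = TRUE, scale = FALSE)
}

xdata <- log2_center(fpkm)
round(colMeans(xdata), 10)
```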

2.1.1 Summary of data

## [INFO]        Loaded data from TCGA: prad.data.2018.10.11
## [INFO]               type of tissue: primary.solid.tumor
## [INFO]   observations (individuals): 445 (10 event / 435 censored)
## [INFO]            variables (genes): 19850

2.1.2 Survival curve

2.2 Load degree data

Network processing

  • Degree calculation: string
  • Edge cutoff: 0
  • Unweighted edges

cor/cov method: pearson

  • only applies if network type is correlation/covariance
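With an edge cutoff and unweighted edges, the degree of a gene reduces to a count of its remaining neighbours. The sketch below illustrates this on a toy edge list. The helper `node_degree`, the column names and the rule of keeping edges strictly above the cutoff are assumptions for illustration, not the report's actual code.

```r
# Toy undirected edge list with confidence scores (made-up values)
edges <- data.frame(
    from  = c("UBC", "UBC", "UBC", "TP53", "AKT1"),
    to    = c("TP53", "AKT1", "SRC", "AKT1", "SRC"),
    score = c(900, 700, 150, 400, 0),
    stringsAsFactors = FALSE
)

# Hypothetical helper: degree of each node after applying an edge cutoff
node_degree <- function(edges, cutoff = 0, unweighted = TRUE) {
    # assumed rule: keep only edges with a score strictly above the cutoff
    kept <- edges[edges$score > cutoff, ]
    # unweighted edges count as 1, weighted edges contribute their score
    w <- if (unweighted) rep(1, nrow(kept)) else kept$score
    # each undirected edge contributes to both of its endpoints
    tapply(c(w, w), c(kept$from, kept$to), sum)
}

degree <- node_degree(edges, cutoff = 0, unweighted = TRUE)
degree[order(-degree)]
```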

2.3 Preparing degree vector

  • Normalize degree between 0 and 1
  • Hub: heuristic( 1 - degree )
  • Orphan: heuristic( degree )
  • trans.fun is a double power to scale the values

see ?glmSparseNet::heuristicScale or ?glmSparseNet::hubHeuristic

# see ?glmSparseNet::heuristicScale
trans.fun <- function(x) {
    heuristicScale(x) + 0.2
}
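To make the steps of this section concrete, the sketch below normalizes a toy degree vector and derives hub and orphan penalties from it. The function `double_power` is a stand-in with an assumed double-power form; consult ?glmSparseNet::heuristicScale for the package's definition. Dividing by the maximum degree is likewise an assumed normalization. The constant 0.2 mirrors trans.fun above and keeps the smallest penalty away from zero.

```r
# Assumed double-power form (stand-in, not necessarily the package definition)
double_power <- function(x) 10^(exp(x) - 1) - 1

# Same offset as trans.fun above, built on the assumed stand-in
trans_fun_sketch <- function(x) double_power(x) + 0.2

# Toy degree vector: two high-degree genes and one made-up low-degree gene
degree <- c(UBC = 10394, TP53 = 4936, lowDegreeGene = 10)

# Assumed normalization: divide by the maximum so values fall between 0 and 1
degree.norm <- degree / max(degree)

# Hub: heuristic(1 - degree), so high-degree genes get the smallest penalty
penalty.hub <- trans_fun_sketch(1 - degree.norm)

# Orphan: heuristic(degree), so low-degree genes get the smallest penalty
penalty.orphan <- trans_fun_sketch(degree.norm)

rbind(hub = penalty.hub, orphan = penalty.orphan)
```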

2.3.1 Genes with degree above 4000

ensembl_gene_id  degree  external_gene_name
ENSG00000150991   10394  UBC
ENSG00000170325    6346  PRDM10
ENSG00000100300    5797  TSPO
ENSG00000174775    5614  HRAS
ENSG00000142208    5573  AKT1
ENSG00000111640    5363  GAPDH
ENSG00000163631    5260  ALB
ENSG00000177606    4952  JUN
ENSG00000141510    4936  TP53
ENSG00000197122    4903  SRC
ENSG00000254647    4592  INS
ENSG00000112062    4187  MAPK14
ENSG00000170315    4114  UBB
ENSG00000170345    4010  FOS

2.3.2 Original degree frequency

Original

LOG10 Scale on X-axis

3 Model Inference

3.1 Train and test sets

## [INFO] Size of sets: (size/events)
##  * Train: 80.00% ::  356 /    8
##  *  Test: 20.00% ::   89 /   2
## [INFO] Number of variables per model:
Model        BaseModel    Alpha  TargetVars  nvars
Elastic Net  Elastic Net  0.60   3           4
Hub          Elastic Net  0.60   3           3
Orphan       Elastic Net  0.60   3           3
Elastic Net  Hub          0.10   13          14
Hub          Hub          0.10   13          13
Orphan       Hub          0.10   13          14
Elastic Net  Orphan       0.10   5           6
Hub          Orphan       0.10   5           5
Orphan       Orphan       0.10   5           5
## [INFO] note, selected variables could be slightly different from target, to have more accuracy increase nlambda in code
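The set sizes reported in this section (356 / 8 for training, 89 / 2 for testing) are what an 80/20 split gives when it is drawn separately among events and among censored individuals. The sketch below reproduces those counts. The helper `balanced_split` is hypothetical and only illustrates one way a balanced split could be done; the seed is the one listed in the report parameters.

```r
set.seed(1985)  # seed value listed in the report parameters

# Event indicator matching the data summary: 10 events, 435 censored
status <- c(rep(1, 10), rep(0, 435))

# Hypothetical helper: sample the training fraction within each stratum
balanced_split <- function(status, train.frac = 0.8) {
    idx.event    <- which(status == 1)
    idx.censored <- which(status == 0)
    train <- c(sample(idx.event,    round(train.frac * length(idx.event))),
               sample(idx.censored, round(train.frac * length(idx.censored))))
    list(train = sort(train),
         test  = setdiff(seq_along(status), train))
}

sets <- balanced_split(status)
c(train.size = length(sets$train), train.events = sum(status[sets$train]),
  test.size  = length(sets$test),  test.events  = sum(status[sets$test]))
```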

4 Results

4.1 Relative risk distribution

Calculated using the inferred models and the train/test datasets. The higher the better.
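In a Cox model, the relative risk of an individual is the exponential of the linear predictor, i.e. of the expression values weighted by the fitted coefficients. The sketch below shows the computation with made-up coefficients and expression values. Splitting individuals at the median relative risk into high and low risk groups is an assumption for illustration.

```r
# Made-up expression values for three selected genes (rows are individuals)
xdata <- matrix(c( 0.5, -1.2,  0.3,
                  -0.4,  0.8,  1.1,
                   2.0,  0.2, -0.7,
                  -1.5, -0.3,  0.4),
                ncol = 3, byrow = TRUE,
                dimnames = list(paste0("patient", 1:4), c("g1", "g2", "g3")))

# Made-up Cox coefficients for those genes
beta <- c(g1 = 0.8, g2 = -0.5, g3 = 0.1)

# Relative risk: exponential of the linear predictor
relative.risk <- exp(drop(xdata %*% beta))

# Assumed grouping rule: above the median is high risk, otherwise low risk
risk.group <- ifelse(relative.risk > median(relative.risk), "high", "low")

data.frame(relative.risk, risk.group)
```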

Test Set

Train Set

4.2 Kaplan-Meier Curves

4.2.1 Models on test sets

Kaplan-Meier curves (test set and training set) for each model and base model combination:

  • Elastic Net (base model: Elastic Net)
  • Hub (base model: Elastic Net)
  • Orphan (base model: Elastic Net)
  • Elastic Net (base model: Hub)
  • Hub (base model: Hub)
  • Orphan (base model: Hub)
  • Elastic Net (base model: Orphan)
  • Hub (base model: Orphan)
  • Orphan (base model: Orphan)

4.3 Summary Table

weighted  penalization  project               tissue               cutoff  coding.genes  alpha  model        nvars  km.train   km.test    c.index.train  c.index.test
FALSE     string        prad.data.2018.10.11  primary.solid.tumor  0       TRUE          0.60   Elastic Net  4      0.0082306  0.1434727  0.8864542      0.1860465
FALSE     string        prad.data.2018.10.11  primary.solid.tumor  0       TRUE          0.60   Hub          3      0.0783487  0.9784762  0.6932271      0.3488372
FALSE     string        prad.data.2018.10.11  primary.solid.tumor  0       TRUE          0.60   Orphan       3      0.2083740  0.4942273  0.8390093      -1.0000000
FALSE     string        prad.data.2018.10.11  primary.solid.tumor  0       TRUE          0.10   Elastic Net  14     0.0039942  0.1868517  0.9063745      0.3023256
FALSE     string        prad.data.2018.10.11  primary.solid.tumor  0       TRUE          0.10   Hub          13     0.0292004  0.9784762  0.7898406      0.4651163
FALSE     string        prad.data.2018.10.11  primary.solid.tumor  0       TRUE          0.10   Orphan       14     0.0434207  0.1100129  0.9143426      0.7209302
FALSE     string        prad.data.2018.10.11  primary.solid.tumor  0       TRUE          0.10   Elastic Net  6      0.0054774  0.1868517  0.8605578      0.2093023
FALSE     string        prad.data.2018.10.11  primary.solid.tumor  0       TRUE          0.10   Hub          5      0.0344144  0.9596664  0.7360558      0.3837209
FALSE     string        prad.data.2018.10.11  primary.solid.tumor  0       TRUE          0.10   Orphan       5      0.0260847  0.5987003  0.8624128      0.6119403

4.4 Non-zero genes

4.4.1 Venn Diagram of selected genes

4.4.1.1 Overlap between models (with same target number of variables)

Base model: Elastic Net with Target Vars: 3 and Alpha: 0.60

Base model: Hub with Target Vars: 13 and Alpha: 0.10

Base model: Orphan with Target Vars: 5 and Alpha: 0.10

Table of genes in models

(not running at the moment)

5 5000 Runs

5.1 C-Index distribution

Test set

Summary of C-Index and Log-rank

metric               model        base.model   target.vars  alpha  mean    std    median  min
C-Index (Test set)   Hub          Elastic Net  3            0.60   0.337   0.678  0.557   -1
C-Index (Test set)   Hub          Hub          13           0.10   0.328   0.680  0.554   -1
C-Index (Test set)   Hub          Orphan       5            0.10   0.295   0.683  0.472   -1
C-Index (Test set)   Elastic Net  Elastic Net  3            0.60   -0.214  0.864  -1.000  -1
C-Index (Test set)   Elastic Net  Hub          13           0.10   0.542   0.344  0.619   -1
C-Index (Test set)   Elastic Net  Orphan       5            0.10   0.091   0.778  0.397   -1
C-Index (Test set)   Orphan       Elastic Net  3            0.60   -0.254  0.819  -1.000  -1
C-Index (Test set)   Orphan       Hub          13           0.10   0.570   0.358  0.655   -1
C-Index (Test set)   Orphan       Orphan       5            0.10   0.377   0.524  0.554   -1
Log-rank (Test set)  Hub          Elastic Net  3            0.60   0.447   0.340  0.305   0.0143
Log-rank (Test set)  Hub          Hub          13           0.10   0.453   0.344  0.305   0.00855
Log-rank (Test set)  Hub          Orphan       5            0.10   0.453   0.347  0.304   0.0101
Log-rank (Test set)  Elastic Net  Elastic Net  3            0.60   0.439   0.264  0.406   0.000311
Log-rank (Test set)  Elastic Net  Hub          13           0.10   0.421   0.314  0.298   0.0126
Log-rank (Test set)  Elastic Net  Orphan       5            0.10   0.471   0.290  0.378   0.00701
Log-rank (Test set)  Orphan       Elastic Net  3            0.60   0.475   0.260  0.495   0.000608
Log-rank (Test set)  Orphan       Hub          13           0.10   0.433   0.320  0.290   0.0244
Log-rank (Test set)  Orphan       Orphan       5            0.10   0.464   0.325  0.317   0.0143
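For reference, the concordance index (C-index) measures how often the individual with the earlier observed event also has the higher predicted risk. The sketch below is a plain base R version of Harrell's definition on made-up data. The helper `c_index` is hypothetical and is not the function used to produce this report. Note that this definition stays within [0, 1]; the report does not say how the -1 entries arise.

```r
# Hypothetical helper: Harrell's concordance index for right-censored data
c_index <- function(time, status, risk) {
    concordant <- 0
    comparable <- 0
    n <- length(time)
    for (i in seq_len(n)) {
        for (j in seq_len(n)) {
            # a pair is comparable when i has an observed event before j's time
            if (status[i] == 1 && time[i] < time[j]) {
                comparable <- comparable + 1
                if (risk[i] > risk[j]) {
                    concordant <- concordant + 1      # earlier event, higher risk
                } else if (risk[i] == risk[j]) {
                    concordant <- concordant + 0.5    # tied risk counts as half
                }
            }
        }
    }
    # no comparable pairs: the index is undefined
    if (comparable == 0) return(NA_real_)
    concordant / comparable
}

# Made-up data: 5 individuals, 2 observed events
time   <- c(2, 4, 6, 8, 10)
status <- c(1, 0, 1, 0, 0)
risk   <- c(3.0, 1.5, 0.5, 1.0, 0.2)

c_index(time, status, risk)
```

With few observed events there are few comparable pairs, so a single pair changes the index considerably.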

C-Index Distribution

Rank

Elastic Net vs. Hub
base.model r.squared
Elastic Net with target nvars: 3 alpha: 0.60 0.0376236
Hub with target nvars: 13 alpha: 0.10 0.1037840
Orphan with target nvars: 5 alpha: 0.10 0.0666755

Elastic Net vs. Orphan
base.model r.squared
Elastic Net with target nvars: 3 alpha: 0.60 0.6595686
Hub with target nvars: 13 alpha: 0.10 0.4239360
Orphan with target nvars: 5 alpha: 0.10 0.1549383

Hub vs. Orphan
base.model r.squared
Elastic Net with target nvars: 3 alpha: 0.60 0.0140233
Hub with target nvars: 13 alpha: 0.10 0.1039123
Orphan with target nvars: 5 alpha: 0.10 0.0916150
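The r.squared values above compare how two models rank the same runs. The report does not show how they were computed; one plausible reading, sketched below on made-up data, is to rank the per-run C-index of each model and take the R² of a linear fit between the two rank vectors. Under that assumption the value equals the squared Spearman correlation.

```r
set.seed(1985)

# Made-up per-run test C-index values for two models over the same 200 runs
c.index.a <- runif(200)
c.index.b <- 0.5 * c.index.a + 0.5 * runif(200)

# Rank the runs separately for each model
rank.a <- rank(c.index.a)
rank.b <- rank(c.index.b)

# R-squared of a linear fit between the two rank vectors
fit <- lm(rank.b ~ rank.a)
r.squared <- summary(fit)$r.squared
r.squared
```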

Train set

Summary of C-Index and Log-rank

metric                model        base.model   target.vars  alpha  mean   std    median  min
C-Index (Train set)   Hub          Elastic Net  3            0.60   0.766  0.085  0.777   0.398
C-Index (Train set)   Hub          Hub          13           0.10   0.812  0.050  0.812   0.63
C-Index (Train set)   Hub          Orphan       5            0.10   0.700  0.066  0.705   0.445
C-Index (Train set)   Elastic Net  Elastic Net  3            0.60   0.764  0.074  0.781   0.472
C-Index (Train set)   Elastic Net  Hub          13           0.10   0.878  0.040  0.878   0.741
C-Index (Train set)   Elastic Net  Orphan       5            0.10   0.767  0.076  0.769   0.5
C-Index (Train set)   Orphan       Elastic Net  3            0.60   0.781  0.057  0.792   0.519
C-Index (Train set)   Orphan       Hub          13           0.10   0.867  0.048  0.877   0.651
C-Index (Train set)   Orphan       Orphan       5            0.10   0.750  0.136  0.788   0.298
Log-rank (Train set)  Hub          Elastic Net  3            0.60   0.066  0.072  0.038   0.000407
Log-rank (Train set)  Hub          Hub          13           0.10   0.021  0.032  0.011   0.000271
Log-rank (Train set)  Hub          Orphan       5            0.10   0.094  0.080  0.077   6e-04
Log-rank (Train set)  Elastic Net  Elastic Net  3            0.60   0.355  0.243  0.308   0.00251
Log-rank (Train set)  Elastic Net  Hub          13           0.10   0.025  0.030  0.012   0.000303
Log-rank (Train set)  Elastic Net  Orphan       5            0.10   0.284  0.268  0.193   0.000734
Log-rank (Train set)  Orphan       Elastic Net  3            0.60   0.358  0.240  0.306   1.54e-13
Log-rank (Train set)  Orphan       Hub          13           0.10   0.058  0.080  0.040   0.000671
Log-rank (Train set)  Orphan       Orphan       5            0.10   0.197  0.208  0.098   0.000965

C-Index Distribution

C-Index Rank

5.2 Log-rank test on Kaplan-Meier models

5.2.1 Log-Rank distribution

Cumulative

Distribution of log-rank test p-values, with individuals separated into high and low risk groups

Distribution

Distribution of log-rank test p-values, with individuals separated into high and low risk groups
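A single log-rank p-value of the kind summarized in this section can be obtained with the survival package, as sketched below on made-up data. The follow-up times, event indicators and group labels are invented for illustration; only `Surv` and `survdiff` come from the survival package.

```r
library(survival)

# Made-up follow-up times (days), event indicators and risk groups
time   <- c(5, 8, 12, 20, 25, 30, 42, 50, 61, 70)
status <- c(1, 1,  1,  0,  1,  0,  1,  0,  0,  0)
group  <- rep(c("high", "low"), each = 5)

# Log-rank test comparing the survival curves of the two groups
fit <- survdiff(Surv(time, status) ~ group)

# Two groups give a chi-squared statistic with 1 degree of freedom
p.value <- pchisq(fit$chisq, df = 1, lower.tail = FALSE)
p.value
```

Repeating this over all runs yields the distribution of p-values.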

5.2.2 Log-Rank Rank

5.2.2.1 Elastic Net vs. Hub

base.model r.squared
Elastic Net with target nvars: 3 alpha: 0.60 0.0214725
Hub with target nvars: 13 alpha: 0.10 0.0371328
Orphan with target nvars: 5 alpha: 0.10 0.0113305

5.2.2.2 Elastic Net vs. Orphan

base.model r.squared
Elastic Net with target nvars: 3 alpha: 0.60 0.2178867
Hub with target nvars: 13 alpha: 0.10 0.0000101
Orphan with target nvars: 5 alpha: 0.10 0.0058999

5.2.2.3 Hub vs. Orphan

base.model r.squared
Elastic Net with target nvars: 3 alpha: 0.60 0.0005798
Hub with target nvars: 13 alpha: 0.10 0.0004029
Orphan with target nvars: 5 alpha: 0.10 0.0005280

5.3 Consensus genes

5.3.1 Selected genes over all runs

Bar plots of frequencies

Overlap between variables selected

Base model: Elastic Net with Target Vars: 3 and Alpha: 0.60

Base model: Hub with Target Vars: 13 and Alpha: 0.10

Base model: Orphan with Target Vars: 5 and Alpha: 0.10

5.3.2 Selected genes in significant runs

i.e. runs with p-value < 0.05 in the log-rank test on the test set.

Bar plots of frequencies

Overlap between variables selected

5.3.2.0.1 Base model Elastic Net with target nvars: 3 alpha: 0.60

5.3.2.0.2 Base model Hub with target nvars: 13 alpha: 0.10

5.3.2.0.3 Base model Orphan with target nvars: 5 alpha: 0.10

5.3.3 Genes in Significant runs

i.e. runs with p-value < 0.05 in the log-rank test on the test set.

Top 20 consensus genes

Gene             Overlap  total
ENSG00000278674  2        571
PRR27            2        571
GAGE2A           2        395
DEFA1            2        241
UBC              1        218
SRC              1        218
PRDM10           1        184
INS              1        137
CDK2             1        125
AKT1             1        101
UMPS             1        65
TP53             1        61
RAD51            1        49
GAPDH            1        48
MYC              1        44
CSN2             2        38
ASPDH            2        38
FOS              1        36
AC068946.1       2        33
UBB              1        31

All Genes

ENSG00000278674(571), PRR27(571), GAGE2A(395), DEFA1(241), UBC(218), SRC(218), PRDM10(184), INS(137), CDK2(125), AKT1(101), UMPS(65), TP53(61), RAD51(49), GAPDH(48), MYC(44), CSN2(38), ASPDH(38), FOS(36), AC068946.1(33), UBB(31), CABS1(29), MIA-RAB4B(29), CDK1(28), LIPN(26), PRKCG(26), MAPK3(25), POTEI(25), CACNG3(24), NUTM2F(20), LHX5(19), NRAS(17), PCSK9(16), NKX1-2(14), ELOVL3(14), HSP90AA1(13), CCDC150(12), AC006486.1(11), KRTAP19-3(10), NUTM2F(10), GSK3A(9), HRAS(8), SLC5A10(7), KRTAP19-7(7), LHX5(7), ACTRT2(7), SLC25A47(6), PCSK9(6), CACNG3(5), OR2T35(5), OR2T2(5), FBXO47(5), ITGA2B(5), RPS6KB1(5), OR2T35(5), TPTE(4), SPINK6(3), MTCP1(3), GAS2(3), PCDHA8(3), MAGEC3(2), OR2W5(2), TARM1(2), AC104581.2(2), RHOA(2), PRSS53(2), FGF11(2), OR2T2(2), CDK3(2), MAGEA9B(2), ENSG00000183791(1), CDC42(1), MAPK1(1), MYB(1), THRSP(1), OR4S1(1), KRTAP10-8(1), ASPDH(1), POM121L12(1), NKX1-2(1), KRTAP19-3(1), PRSS51(1), AC104581.2(1), OR1S1(1)
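A consensus table of this kind can be built by counting how often each gene is selected across runs. The sketch below uses made-up selections. Reading "Overlap" as the number of models that selected a gene at least once, and "total" as the number of selections summed over models, is an assumption; the report does not define either column.

```r
# Made-up selections: for each model, one vector of gene names per significant run
selected <- list(
    hub    = list(c("UBC", "SRC"), c("UBC", "AKT1"), c("UBC", "SRC")),
    orphan = list(c("PRR27", "GAGE2A"), c("PRR27", "DEFA1")),
    enet   = list(c("PRR27", "DEFA1"), c("DEFA1"))
)

# Per model: how many runs selected each gene
counts <- lapply(selected, function(runs) table(unlist(runs)))

genes <- sort(unique(unlist(selected)))

consensus <- data.frame(
    gene = genes,
    # assumed meaning of "Overlap": number of models that ever selected the gene
    overlap = vapply(genes, function(g) {
        sum(vapply(counts, function(tab) g %in% names(tab), logical(1)))
    }, integer(1), USE.NAMES = FALSE),
    # assumed meaning of "total": selections summed over all models
    total = vapply(genes, function(g) sum(unlist(selected) == g),
                   integer(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
)

# Most frequently selected genes first
consensus[order(-consensus$total), ]
```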

5.4 Best C-index test in all runs

Best Elastic Net model (C-Index test)

[INFO] Coefs. list

[1] "CACNG3, DEFA1, LHX5, OR4S1"

Test set

Full set

Best Hub model (C-Index test)

[INFO] Coefs. list

[1] "AKT1, CDK2, PRDM10, SRC, UBC"

Test set

Full set

Best Orphan model (C-Index test)

[INFO] Coefs. list

[1] "ENSG00000278674, GAGE2A, PRR27"

Test set

Full set

5.5 Best Log-rank test in all runs

Best Elastic Net model (Log-rank test)

[INFO] Coefs. list

[1] "DEFA1, ENSG00000278674, PRR27"

Test set

Full set

Best Hub model (Log-rank test)

[INFO] Coefs. list

[1] "CDK2, FOS, GAPDH, INS, MAPK3, MYC, PRDM10, PRKCG, RAD51, SRC, TP53, UBC, UMPS"

Test set

Full set

Best Orphan model (Log-rank test)

[INFO] Coefs. list

[1] "DEFA1, ENSG00000278674, GAGE2A, PRR27"

Test set

Full set

6 Parameters for the report

## [INFO]          balanced.sets: TRUE
## [INFO]        calc.params.old: FALSE
## [INFO]           coding.genes: TRUE
## [INFO]     degree.correlation: pearson
## [INFO]          degree.cutoff:       0.000
## [INFO]            degree.type: string
## [INFO]      degree.unweighted: TRUE
## [INFO]      handle.duplicates: keep_first
## [INFO]                   log2: TRUE
## [INFO]                n.cores:      14.000
## [INFO]          normalization: center
## [INFO]                 ntimes:    5000.000
## [INFO]                project: prad.data.2018.10.11
## [INFO]                   seed:    1985.000
## [INFO]                 subset: Inf
## [INFO]   subtract.surv.column: (I do not know how to display this)
## [INFO]            target.vars: list(alpha = 0.6, vars = 3), list(alpha = 0.1, vars = 13), list(alpha = 0.1, vars = 5)
## [INFO]                 tissue: primary.solid.tumor
## [INFO]                  train:       0.800
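The target.vars parameter asks for models with a fixed number of variables, yet the regularization path is only evaluated on a finite grid of lambda values, so the closest available model may contain slightly more or fewer variables than requested. The sketch below illustrates this with a made-up path. The helper `closest_to_target` and the data frame layout are hypothetical; a finer grid (a larger nlambda) would leave fewer gaps between model sizes.

```r
# Made-up regularization path: number of non-zero coefficients at each lambda
# (a larger lambda gives a sparser model); the coarse grid skips some sizes
path <- data.frame(
    lambda = c(0.50, 0.30, 0.20, 0.10, 0.05),
    nvars  = c(0,    1,    4,    9,    14)
)

# Hypothetical helper: pick the lambda whose model size is closest to the target
closest_to_target <- function(path, target) {
    path[which.min(abs(path$nvars - target)), ]
}

closest_to_target(path, target = 3)
closest_to_target(path, target = 13)
```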